Start by loading my libraries:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(rstan)
## Loading required package: StanHeaders
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
##
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
##
## extract
library(bayesplot)
## This is bayesplot version 1.8.1
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
## * Does _not_ affect other ggplot2 plots
## * See ?bayesplot_theme_set for details on theme setting
library(bayesrules)
You can change step 1 to define grids with different amounts of segmentation. The more discrete steps you use, the 'clearer' the picture of the posterior will be.
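For intuition, here is a quick base-R sketch (using the Beta(5,16) posterior from the first exercise below) of how a 5-step grid compares with a 201-step grid, the same two resolutions used later in this document:

```r
# two grids over [0, 1] for the Beta(5,16) posterior
coarse <- seq(0, 1, length.out = 5)
fine   <- seq(0, 1, length.out = 201)

# discretized (normalized) posterior heights on each grid
p_coarse <- dbeta(coarse, 5, 16) / sum(dbeta(coarse, 5, 16))
p_fine   <- dbeta(fine, 5, 16) / sum(dbeta(fine, 5, 16))

# the coarse grid lumps nearly all mass on a single point (pi = 0.25),
# while the fine grid spreads it over many nearby values
max(p_coarse)
max(p_fine)
```

With only 5 steps almost all posterior mass lands on pi = 0.25, which matches the lopsided sample table in the first grid-approximation exercise below.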
For each MCMC simulation scenario, sketch what a single chain trace plot might look like for each simulation:
Different Mixing Scenarios
For each simulation scenario below, describe how the scenario could impact the posterior approximation:
This could mean that the chain doesn't get sufficiently close to the true posterior in the number of iterations you run, so the approximation you have at the end of the iterations will be inaccurate.
It produces strong trends in the chain, making the draws behave less like independent samples. This yields larger error in the posterior approximation.
This will oversample values in one region of the posterior and produce erroneous spikes in the posterior approximation.
Your friend missed class this week and they are allergic to reading textbooks. You decide to answer their questions:
It gives you some idea of whether there are potential red flags suggesting your simulation is not a good approximation of the true posterior.
When the model becomes too complex to derive the posterior analytically, MCMC lets you run iterations that approximate the posterior.
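To make that concrete, here is a toy Metropolis-style sketch in base R targeting the Beta(5,16) posterior from this chapter's example. This is only an illustration of the "chain of dependent draws" idea, not the textbook's code and not what Stan actually runs (Stan uses Hamiltonian Monte Carlo), and the truncated proposal window makes it only approximately correct near the boundaries:

```r
set.seed(369)
n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0.5  # arbitrary starting value

for (i in 2:n_iter) {
  current <- chain[i - 1]
  # propose a nearby value (window clipped to [0, 1]; a rough sketch, since
  # the clipping makes the proposal slightly asymmetric)
  proposal <- runif(1, max(0, current - 0.1), min(1, current + 0.1))
  # accept with probability given by the ratio of target densities
  alpha <- dbeta(proposal, 5, 16) / dbeta(current, 5, 16)
  chain[i] <- if (runif(1) < alpha) proposal else current
}

# after discarding an initial "burn-in", the chain's draws approximate
# the posterior; the mean should be near the true value 5/21
mean(chain[-(1:1000)])
```

Each draw depends only on the previous one, which is exactly the Markov property the chapter's diagnostics (trace plots, mixing) are designed to probe.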
RStan allows us to define a Bayesian model structure and simulate the posterior. It can run the MCMC algorithm to draw an approximate sample from the distributions specified in RStan notation, and it also provides different ways to visualize the posterior and assess the accuracy of the simulation with diagnostic techniques.
After class I actually feel pretty good about the concepts in the chapter.
Consider the Beta-Binomial model for pi with Y|pi ~ Bin(n,pi) and pi ~ Beta(3,8). Suppose that in n = 10 independent trials you observe Y = 2 successes. The calculated posterior is Beta(5,16).
#step 1: discrete pieces
grid_data <- data.frame(pi_grid = seq(from = 0, to = 1, length = 5))
#step 2: evaluate the prior and likelihood
grid_data <- grid_data |>
mutate(prior = dbeta(pi_grid, 3, 8),
likelihood = dbinom(2,10,pi_grid))
#step 3: calculate the products of prior and likelihoods
grid_data <- grid_data |>
mutate(unnormalized = likelihood * prior,
posterior = unnormalized / sum(unnormalized))
#confirm that the posterior sums to 1
grid_data |>
summarize(sum(unnormalized),sum(posterior))
## sum(unnormalized) sum(posterior)
## 1 0.8765603 1
#step 4: draw a sample
set.seed(369)
post_sample <- sample_n(grid_data,size=10000,weight=posterior, replace=TRUE)
post_sample |>
tabyl(pi_grid) |>
adorn_totals("row")
## pi_grid n percent
## 0.25 9643 0.9643
## 0.5 357 0.0357
## Total 10000 1.0000
ggplot(post_sample,aes(x=pi_grid)) +
geom_histogram(aes(y=..density..),color="white") +
stat_function(fun=dbeta,args=list(5,16)) +
lims(x=c(0,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).
#step 1: discrete pieces
grid_data <- data.frame(pi_grid = seq(from = 0, to = 1, length = 201))
#step 2: evaluate the prior and likelihood
grid_data <- grid_data |>
mutate(prior = dbeta(pi_grid, 3, 8),
likelihood = dbinom(2,10,pi_grid))
#step 3: calculate the products of prior and likelihoods
grid_data <- grid_data |>
mutate(unnormalized = likelihood * prior,
posterior = unnormalized / sum(unnormalized))
#confirm that the posterior sums to 1
grid_data |>
summarize(sum(unnormalized),sum(posterior))
## sum(unnormalized) sum(posterior)
## 1 41.79567 1
#step 4: draw a sample
set.seed(369)
post_sample <- sample_n(grid_data,size=10000,weight=posterior, replace=TRUE)
post_sample |>
tabyl(pi_grid) |>
adorn_totals("row")
## pi_grid n percent
## 0.015 2 0.0002
## 0.02 1 0.0001
## 0.025 3 0.0003
## (... intermediate grid values omitted for brevity ...)
## 0.61 2 0.0002
## 0.635 1 0.0001
## Total 10000 1.0000
ggplot(post_sample,aes(x=pi_grid)) +
geom_histogram(aes(y=..density..),color="white") +
stat_function(fun=dbeta,args=list(5,16)) +
lims(x=c(0,1))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).
Consider the Gamma-Poisson model for lambda with Yi | lambda ~ Pois(lambda) and lambda ~ Gamma(20,5). Suppose you observe n = 3 independent data points (Y1,Y2,Y3) = (0,1,0).
#step 1: from 0 to 8 equally spaced
grid_data <- data.frame(lambda_grid = seq(from=0,to=8,length=9))
#step 2: using the gamma values 20, 5 and each observed Y value in dpois
grid_data <- grid_data |>
mutate(prior=dgamma(lambda_grid,20,5),
likelihood=dpois(0,lambda_grid) * dpois(1,lambda_grid) * dpois(0,lambda_grid))
#step 3: normalize the posterior
grid_data <- grid_data |>
mutate(unnormalized=likelihood*prior,
posterior=unnormalized/sum(unnormalized))
#step 4: draw a sample
set.seed(369)
post_sample <- sample_n(grid_data,size=10000,
weight=posterior,replace=TRUE)
Then we can see the graph of the posterior:
ggplot(post_sample,aes(x=lambda_grid)) +
geom_histogram(aes(y=..density..),color="white") +
stat_function(fun=dgamma,args=list(21,8))+ #overlay the true Gamma(21,8) posterior
lims(x=c(0,8))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).
grid_data <- data.frame(lambda_grid = seq(from=0,to=8,length=201))
grid_data <- grid_data |>
mutate(prior=dgamma(lambda_grid,20,5),
likelihood=dpois(0,lambda_grid) * dpois(1,lambda_grid) * dpois(0,lambda_grid))
grid_data <- grid_data |>
mutate(unnormalized=likelihood*prior,
posterior=unnormalized/sum(unnormalized))
set.seed(369)
post_sample <- sample_n(grid_data,size=10000,
weight=posterior,replace=TRUE)
ggplot(post_sample,aes(x=lambda_grid)) +
geom_histogram(aes(y=..density..),color="white") +
stat_function(fun=dgamma,args=list(21,8))+ #overlay the true Gamma(21,8) posterior
lims(x=c(0,8))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 2 rows containing missing values (geom_bar).
Consider the Normal-Normal model for mu with Yi | mu ~ N(mu,1.3^2) and mu ~ N(10,1.2^2). Suppose that in n = 4 independent observations you observe: (Y1,Y2,Y3,Y4) = (7.1,8.9,8.4,8.6).
#step 1: define a grid of values
grid_data <- data.frame(mu_grid = seq(from = 5, to = 15, length = 11))
#step 2: evaluate the prior and likelihoods using the prior values 10 and 1.2 and the observation model with sd = 1.3
grid_data <- grid_data |>
mutate(prior = dnorm(mu_grid,mean=10,sd=1.2),
likelihood = dnorm(7.1,mean=mu_grid,sd=1.3)*
dnorm(8.9,mean=mu_grid,sd=1.3)*
dnorm(8.4,mean=mu_grid,sd=1.3)*
dnorm(8.6,mean=mu_grid,sd=1.3))
#step 3: approximate the posterior
grid_data <- grid_data |>
mutate(unnormalized = likelihood * prior,
posterior = unnormalized / sum(unnormalized))
#plot it out to see what's what
ggplot(grid_data,aes(x=mu_grid,y=posterior)) +
geom_point() +
geom_segment(aes(x=mu_grid,xend=mu_grid,y=0,yend=posterior))
This shows that the highest density is between 8 and 9.
#step 1: define a grid of values
grid_data <- data.frame(mu_grid = seq(from = 5, to = 15, length = 201))
#step 2: evaluate the prior and likelihoods using the prior values 10 and 1.2 and the observation model with sd = 1.3
grid_data <- grid_data |>
mutate(prior = dnorm(mu_grid,mean=10,sd=1.2),
likelihood = dnorm(7.1,mean=mu_grid,sd=1.3)*
dnorm(8.9,mean=mu_grid,sd=1.3)*
dnorm(8.4,mean=mu_grid,sd=1.3)*
dnorm(8.6,mean=mu_grid,sd=1.3))
#step 3: approximate the posterior
grid_data <- grid_data |>
mutate(unnormalized = likelihood * prior,
posterior = unnormalized / sum(unnormalized))
#plot it out to see what's what
ggplot(grid_data,aes(x=mu_grid,y=posterior)) +
geom_point() +
geom_segment(aes(x=mu_grid,xend=mu_grid,y=0,yend=posterior))
With more intervals we can see that the graph has higher density between 8 and 9.
This is like the Michelle polling example, when you want to get a posterior that involves multiple nested parameters (e.g., voting patterns in multiple states). Another example would be if you wanted to look at win probabilities across a football league, looking at multiple teams with many factors influencing wins for each individual team.
Grid approximation depends on the size of the grid: the finer the grid, the better the approximation. However, with two dimensions instead of one, the number of grid values you have to check is squared; with three dimensions it is cubed. In general the grid size grows exponentially with the number of parameters, which quickly gets out of control.
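A quick arithmetic sketch of that growth, using the 201-point-per-axis grids from the exercises above:

```r
# with m points per axis, a d-dimensional grid has m^d points to evaluate
m <- 201
points_needed <- sapply(1:3, function(d) m^d)
points_needed  # 201, 40401, 8120601
```

Three parameters already require over eight million evaluations at this resolution, which is why MCMC scales better for multi-parameter models.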
Both MCMC and grid approximation are approximate methods that get harder to tune as the model grows more complex, and both leave room for error in the chosen grid size or number of iterations.
They both let you approximate posteriors that are too complex to derive analytically. Both produce a sample of N theta values.
Grid approximation takes an independent sample of theta from a discrete approximation of the posterior.
MCMC has more flexibility for more complicated models and runs a number of chains to strengthen the approximation.
Below are examples of chains for different probability parameters theta. For each example, determine whether the given chain is a Markov chain. Explain.
This is a Markov chain because it uses the previous day's action (going to eat Thai) to decide whether you will repeat that action; the next state depends only on the most recent one.
This is not a Markov chain. The data you have is whether you played the lottery the previous day, but the probability that you win isn't assessed based on previous playing experience; each drawing is independent.
This is not a Markov chain. You have no data on the outcomes of previous games; all you know is that you played.
Use the given info to define the Bayesian model structure using correct RStan syntax. The number of observations matters for the bounds in the data block.
bin_model = '
data {
int<lower = 0, upper = **number of observations**> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(20,pi);
pi ~ beta(1,1);
}
'
gamma_model ='
data {
int<lower = 0> Y[**number of events**];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(4,2);
}
'
normal_model = '
data {
vector[**this is the size of the data vector**] Y;
}
parameters {
real mu;
}
model {
Y ~ normal(mu, 1.3);
mu ~ normal(10,1.2);
}
'
Use the given info to (1) define the Bayesian model structure, and (2) simulate the posterior using correct RStan syntax:
#define the model
bb_model <- "
data {
int<lower = 0, upper = **number of observations**> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(20,pi);
pi ~ beta(1,1);
}
"
#simulate the posterior
bb_sim <- stan(model_code = bb_model,data=list(Y=12),chains=4,iter=5000*2,seed=369)
#define the model
gp_model = "
data {
int<lower = 0> Y[**number of events**];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(4,2);
}
"
gp_sim <- stan(model_code = gp_model, data=list(Y=c(3,**number of trials**)),
chains=4,iter=5000*2,seed=369)
#define the model
normal_model = '
data {
vector[**this is the size of the data vector**] Y;
}
parameters {
real mu;
}
model {
Y ~ normal(mu, 1.3);
mu ~ normal(10,1.2);
}
'
#simulate the posterior
d <- list(Y = c(7.1,8.9,8.4,8.6))
nm_sim <- stan(model_code = normal_model, data = d, chains = **number**, iter = **number * 2 **, seed = 369)
Consider the beta-binomial model for pi with Y|pi ~ Bin(n,pi) and pi ~ Beta(3,8). Suppose that in n = 10 independent trials, you observe Y = 2 successes.
#step 1: define the model
bb_model <- "
data {
int<lower = 0, upper = 10> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(10,pi);
pi ~ beta(3,8);
}
"
#Step 2: simulate the posterior
bb_sim <- stan(model_code = bb_model,data=list(Y=2),chains=3,iter=6000*2,seed=369)
mcmc_trace(bb_sim,pars="pi",size=0.1)
The trace plot spans 6000 iterations per chain. This is because the model "burns" (discards) the first half of the 12,000 iterations as warmup and keeps the second half, where the behavior of the chain is theoretically more stable.
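The bookkeeping behind that number (plain arithmetic mirroring the chains = 3, iter = 6000*2 call above, not rstan output):

```r
chains <- 3
iter <- 6000 * 2            # iterations requested per chain
warmup <- iter / 2          # rstan discards the first half by default
kept_per_chain <- iter - warmup  # 6000 draws shown per chain in the trace plot
total_draws <- chains * kept_per_chain
c(kept_per_chain = kept_per_chain, total_draws = total_draws)
```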
#density plot of markov chain values
mcmc_dens(bb_sim,pars="pi") +
yaxis_text(TRUE) +
ylab("density")
Taking the formula for updating the Beta with data, the posterior we would get from a prior of Beta(3,8) with Y = 2 and n = 10 is Beta(5,16). This has a mean of 5 / (5 + 16) ≈ 0.238 and a mode of (5 - 1) / (5 + 16 - 2) = 4/19 ≈ 0.211. This looks pretty close to the visual representation we get above!
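Computing those Beta(5,16) summaries directly in base R (a quick sanity check, not part of the exercise):

```r
# Beta(a, b) summaries: mean = a/(a+b), mode = (a-1)/(a+b-2) for a, b > 1
a <- 5; b <- 16
mean_post <- a / (a + b)            # 5/21
mode_post <- (a - 1) / (a + b - 2)  # 4/19
round(c(mean = mean_post, mode = mode_post), 3)
```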
Repeat the above exercise for a Beta-Binomial model with Y|pi ~ Bin(n,pi) and pi ~ Beta(4,3) where Y = 4 successes in n = 12 independent trials:
#step 1: define the model
bb_model <- "
data {
int<lower = 0, upper = 12> Y;
}
parameters {
real<lower = 0, upper = 1>pi;
}
model {
Y ~ binomial(12,pi);
pi ~ beta(4,3);
}
"
#Step 2: simulate the posterior
bb_sim <- stan(model_code = bb_model,data=list(Y=4),chains=3,iter=6000*2,seed=369)
mcmc_trace(bb_sim,pars="pi",size=0.1)
The range is again 6000 because we burn the first half of our 12,000 iterations.
Density plot
#density plot of markov chain values
mcmc_dens(bb_sim,pars="pi") +
yaxis_text(TRUE) +
ylab("density")
From prior Beta(4,3), Y = 4, n = 12 –> posterior Beta(8,11)
Mean = 8 / (8 + 11) ≈ 0.421
Mode = (8 - 1) / (8 + 11 - 2) = 7/17 ≈ 0.412
This looks like it stacks up!
Consider the Gamma-Poisson model with Yi | lambda ~ Pois(lambda) and lambda ~ Gamma(20,5) and you observe n = 3 independent data points (Y1,Y2,Y3) = (0,1,0).
#step 1: define the model
gp_model <- "
data {
int<lower = 0> Y[3];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(20,5);
}
"
#step 2: simulate the posterior
gp_sim <- stan(model_code = gp_model, data=list(Y=c(0,1,0)),
chains=4,iter=5000*2,seed=369)
#trace plots of the 4 markov chains
mcmc_trace(gp_sim,pars="lambda",size=0.1)
#density plot of the markov chain values
mcmc_dens(gp_sim,pars="lambda") +
yaxis_text(TRUE) +
ylab("density")
I would guess around 2.7
From chapter 5, I’m pulling the formula that the posterior gamma is: Gamma (s + sum(yi), r + n). The mean is: \(\frac{s}{r}\) and the mode is \(\frac{s - 1}{r} \text{ for s > 1}\)
The prior for this model had s = 20 and r = 5 and the sum of yi is 1 with n = 3.
s_posterior = s + sum(yi) = 20 + 1 = 21 r_posterior = r + n = 5 + 3 = 8
Posterior Gamma(21,8)
Mean = 21 / 8 = 2.625
Mode = (s - 1) / r = (21 - 1) / 8 = 2.5
These values feel consistent with the density plot we produced with MCMC.
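Checking the Gamma(21,8) summaries directly in R (a quick arithmetic check, not part of the exercise):

```r
# Gamma(s, r) summaries: mean = s/r, mode = (s-1)/r for s >= 1
s <- 21; r <- 8
c(mean = s / r, mode = (s - 1) / r)  # 2.625 and 2.5
```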
Do 6.15 again with lambda ~ Gamma(5,5) prior model
#step 1: define the model
gp_model <- "
data {
int<lower = 0> Y[3];
}
parameters {
real<lower = 0>lambda;
}
model {
Y ~ poisson(lambda);
lambda ~ gamma(5,5);
}
"
#step 2: simulate the posterior
gp_sim <- stan(model_code = gp_model, data=list(Y=c(0,1,0)),
chains=4,iter=5000*2,seed=369)
#trace plots of the 4 markov chains
mcmc_trace(gp_sim,pars="lambda",size=0.1)
#density plot of the markov chain values
mcmc_dens(gp_sim,pars="lambda") +
yaxis_text(TRUE) +
ylab("density")
The most plausible value of lambda looks to be around 0.6.
Specify the posterior model and compare:
Gamma (s + sum(yi), r + n). The mean is: \(\frac{s}{r}\) and the mode is \(\frac{s - 1}{r} \text{ for s > 1}\)
The prior for this model had s = 5 and r = 5 and the sum of yi is 1 with n = 3.
s_posterior = 5 + 1 = 6 r_posterior = 5 + 3 = 8
Posterior Gamma(6,8)
Mean = 6/8 = 0.75
Mode = (6 - 1) / 8 = 0.625
This feels consistent with the graph produced by MCMC.
Repeat 6.15 for the Normal-Normal with Yi|mu ~ N(mu,1.3^2) and mu ~ N(10,1.2^2). Suppose that on n = 4 observations, you observe: (Y1,Y2,Y3,Y4) = (7.1,8.9,8.4,8.6).
#define the model
normal_model = '
data {
vector[4] Y;
}
parameters {
real mu;
}
model {
Y ~ normal(mu, 1.3);
mu ~ normal(10,1.2);
}
'
#simulate the posterior
d <- list(Y = c(7.1,8.9,8.4,8.6))
nm_sim <- stan(model_code = normal_model, data = d, chains = 4, iter=5000*2,seed=369 )
#trace plots of the 4 markov chains
mcmc_trace(nm_sim,pars="mu",size=0.1)
#density plot of the markov chain values
mcmc_dens(nm_sim,pars="mu") +
yaxis_text(TRUE) +
ylab("density")
This appears to be a bit under 9
Taking from chapter 5, the posterior for mu in the Normal-Normal model is: \(N\left(\theta\frac{\sigma^2}{n\tau^2 + \sigma^2} + \bar{y}\frac{n\tau^2}{n\tau^2 + \sigma^2},\ \frac{\tau^2\sigma^2}{n\tau^2 + \sigma^2}\right)\), where \(\bar{y}\) is the sample mean.
For a normal posterior the mean and mode coincide, so both equal the posterior mean of mu.
Prior and data values of interest are:
sigma^2 = 1.3^2 (data model), theta = 10 and tau^2 = 1.2^2 (prior)
n = 4, y_mean = (7.1 + 8.9 + 8.4 + 8.6) / 4 = 8.25
mu_posterior = theta_prior * (sigma^2 / (n*tau^2 + sigma^2)) + y_mean * (n*tau^2 / (n*tau^2 + sigma^2))
sig_posterior = (tau^2 * sigma^2) / (n*tau^2 + sigma^2)
sig_pr <- 1.3^2
theta <- 10
tau_pr <- 1.2^2
n <- 4
y_mean <- 8.25
mu_post <- (theta * (sig_pr / (n*tau_pr + sig_pr))) + (y_mean * (n*tau_pr / (n*tau_pr + sig_pr)))
sig_post = (tau_pr * sig_pr) / ((n*tau_pr) + sig_pr)
mu_post
## [1] 8.64698
sig_post
## [1] 0.3266577
This is close to my guess of around 9 for mu.
Repeat 6.17 with Yi|mu ~ N(mu, 8^2) and mu ~ N(-14,2^2). Suppose that on n = 5 independent observations you observe data: (Y1,Y2,Y3,Y4,Y5) = (-10.1,5.5,0.1,-1.4,11.5)
#define the model
normal_model_2 = '
data {
vector[5] Y;
}
parameters {
real mu;
}
model {
Y ~ normal(mu, 8);
mu ~ normal(-14,2);
}
'
#simulate the posterior
d <- list(Y = c(-10.1,5.5,0.1,-1.4,11.5))
nm_sim_2 <- stan(model_code = normal_model_2, data = d, chains = 4, iter=5000*2,seed=369 )
#trace plots of the 4 markov chains
mcmc_trace(nm_sim_2,pars="mu",size=0.1)
#density plot of the markov chain values
mcmc_dens(nm_sim_2,pars="mu") +
yaxis_text(TRUE) +
ylab("density")
This shows mu is a little below -10, let’s say -10.5
Using the same approach as the above.
The y_mean = (-10.1 + 5.5 + 0.1 + -1.4 + 11.5) / 5 = 1.12, with n = 5.
Yi|mu ~ N(mu, 8^2) and mu ~ N(-14,2^2)
sig_pr <- 8^2
theta <- -14
tau_pr <- 2^2
n <- 5
y_mean <- 1.12
mu_post <- (theta * (sig_pr / (n*tau_pr + sig_pr))) + (y_mean * (n*tau_pr / (n*tau_pr + sig_pr)))
sig_post = (tau_pr * sig_pr) / ((n*tau_pr) + sig_pr)
mu_post
## [1] -10.4
sig_post
## [1] 3.047619
This is really spot on for the mu!! Wow, I’m so good at reading graphs!!